A Knowledge-Base Oriented Approach for Automatic Keyword Extraction
نویسندگان
چکیده
Automatic keyword extraction is an important subfield of information extraction process. It is a difficult task, where numerous different techniques and resources have been proposed. In this paper, we propose a generic approach to extract keyword from documents using encyclopedic knowledge. Our two-step approach first relies on a classification step for identifying candidate keywords followed by a learning-to-rank method depending on a user-defined keyword profile to order the candidates. The novelty of our approach relies on i) the usage of the keyword profile ii) generic features derived from Wikipedia categories and not necessarily related to the document content. We evaluate our system on keyword datasets and corpora from standard evaluation campaign and show that our system improves the global process of keyword extraction.
منابع مشابه
Automatic keyword extraction using Latent Dirichlet Allocation topic modeling: Similarity with golden standard and users' evaluation
Purpose: This study investigates the automatic keyword extraction from the table of contents of Persian e-books in the field of science using LDA topic modeling, evaluating their similarity with golden standard, and users' viewpoints of the model keywords. Methodology: This is a mixed text-mining research in which LDA topic modeling is used to extract keywords from the table of contents of sci...
متن کاملAn Ontology Based Automatic Annotation and Semantic Information Retrieval from Tamil Documents
The use of Ontologies to overcome the limitations of keyword-based search has been put forward as one of the motivations of the Semantic IR. We propose a model, which includes an ontology-based scheme for the automatic annotation of Tamil documents and a retrieval system. The retrieval model is based on an adaptation of the classic vector-space model, including an annotation weighting algorithm...
متن کاملMultifaceted Entity/Fact/Relation Retrieval via Semantic Search Interface based on Domain Knowledge Extraction
A significant kind of information search on the Web is concerned with finding entities of a specific type that satisfy certain semantic conditions. Nevertheless, the conventional keyword-based search mechanism commonly found on the Web does not enable the user to specify the semantic type/conditions for the entities sought, and accordingly does not return the entities as direct search results. ...
متن کاملAdvertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles
When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human level intelligence, it is inevitable to use the knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a goldmine repository of knowledge. A recen...
متن کاملObject-Oriented Method for Automatic Extraction of Road from High Resolution Satellite Images
As the information carried in a high spatial resolution image is not represented by single pixels but by meaningful image objects, which include the association of multiple pixels and their mutual relations, the object based method has become one of the most commonly used strategies for the processing of high resolution imagery. This processing comprises two fundamental and critical steps towar...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2013